Goto

Collaborating Authors

 ar 2


BVFLMSP : Bayesian Vertical Federated Learning for Multimodal Survival with Privacy

arXiv.org Machine Learning

Multimodal time-to-event prediction often requires integrating sensitive data distributed across multiple parties, making centralized model training impractical due to privacy constraints. At the same time, most existing multimodal survival models produce single deterministic predictions without indicating how confident the model is in its estimates, which can limit their reliability in real-world decision making. To address these challenges, we propose BVFLMSP, a Bayesian Vertical Federated Learning (VFL) framework for multimodal time-to-event analysis based on a Split Neural Network architecture. In BVFLMSP, each client independently models a specific data modality using a Bayesian neural network, while a central server aggregates intermediate representations to perform survival risk prediction. To enhance privacy, we integrate differential privacy mechanisms by perturbing client side representations before transmission, providing formal privacy guarantees against information leakage during federated training. We first evaluate our Bayesian multimodal survival model against widely used single modality survival baselines and the centralized multimodal baseline MultiSurv. Across multimodal settings, the proposed method shows consistent improvements in discrimination performance, with up to 0.02 higher C-index compared to MultiSurv. We then compare federated and centralized learning under varying privacy budgets across different modality combinations, highlighting the tradeoff between predictive performance and privacy. Experimental results show that BVFLMSP effectively includes multimodal data, improves survival prediction over existing baselines, and remains robust under strict privacy constraints while providing uncertainty estimates.


AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models

arXiv.org Artificial Intelligence

Abstraction--the ability to recognize and distill essential computational patterns from complex problem statements--is a foundational skill in computer science, critical both for human problem-solvers and coding-oriented large language models (LLMs). Despite recent advances in training LLMs for code generation using reinforcement learning (RL), most existing approaches focus primarily on superficial pattern recognition, overlooking explicit training for abstraction. In this study, we propose AR$^2$ (Adversarial Reinforcement Learning for Abstract Reasoning), a novel framework explicitly designed to enhance the abstraction abilities of LLMs. AR$^2$ employs a teacher model to transform kernel problems into narrative-rich, challenging descriptions without changing their fundamental logic. Simultaneously, a student coding model is trained to solve these complex narrative problems by extracting their underlying computational kernels. Experimental results demonstrate that AR$^2$ substantially improves the student model's accuracy on previously unseen, challenging programming tasks, underscoring abstraction as a key skill for enhancing LLM generalization.


Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis

arXiv.org Machine Learning

The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data -- implying that some form of regularisation is essential for generalisation. Existing theoretical analyses primarily rely on algorithm-independent techniques such as uniform convergence, heavily utilising model structure to obtain generalisation bounds. In this work, we instead leverage the algorithmic aspects that promote generalisation in diffusion models, developing a general theory of algorithm-dependent generalisation for this setting. Borrowing from the framework of algorithmic stability, we introduce the notion of score stability, which quantifies the sensitivity of score-matching algorithms to dataset perturbations. We derive generalisation bounds in terms of score stability, and apply our framework to several fundamental learning settings, identifying sources of regularisation. In particular, we consider denoising score matching with early stopping (denoising regularisation), sampler-wide coarse discretisation (sampler regularisation) and optimising with SGD (optimisation regularisation). By grounding our analysis in algorithmic properties rather than model structure, we identify multiple sources of implicit regularisation unique to diffusion models that have so far been overlooked in the literature.


Attraction-Repulsion Swarming: A Generalized Framework of t-SNE via Force Normalization and Tunable Interactions

arXiv.org Machine Learning

We propose a new method for data visualization based on attraction-repulsion swarming (ARS) dynamics, which we call ARS visualization. ARS is a generalized framework that is based on viewing the t-distributed stochastic neighbor embedding (t-SNE) visualization technique as a swarm of interacting agents driven by attraction and repulsion. Motivated by recent developments in swarming, we modify the t-SNE dynamics to include a normalization by the \emph{total influence}, which results in better posed dynamics in which we can use a data size independent time step (of $h=1$) and a simple iteration, without the need for the array of optimization tricks employed in t-SNE. ARS also includes the ability to separately tune the attraction and repulsion kernels, which gives the user control over the tightness within clusters and the spacing between them in the visualization. In contrast with t-SNE, our proposed ARS data visualization method is not gradient descent on the Kullback-Leibler divergence, and can be viewed solely as an interacting particle system driven by attraction and repulsion forces. We provide theoretical results illustrating how the choice of interaction kernel affects the dynamics, and experimental results to validate our method and compare to t-SNE on the MNIST and Cifar-10 data sets.


Midjourney is testing a new photo-realistic version

#artificialintelligence

What could be more exciting for an AI artist to wake up early in the morning and see a new announcement in the Midjourney discord server about further testing again? Shortly after the public release of Stable Diffusion, Midjourney brought its photo-realistic beta version to the market just a few days ago. However, after a testing phase of only 24 hours, David and his team decided to turn this beta version off and make some improvements. Today it is back again with some new options. For the next 24-48 hours, depending on the user behaviors (guys, please, we all know what "ladies in bikinis" look like), we can create new, more cohesive AI images.


Non-asymptotic bounds for sampling algorithms without log-concavity

arXiv.org Machine Learning

Discrete time analogues of ergodic stochastic differential equations (SDEs) are one of the most popular and flexible tools for sampling high-dimensional probability measures. Non-asymptotic analysis in the $L^2$ Wasserstein distance of sampling algorithms based on Euler discretisations of SDEs has been recently developed by several authors for log-concave probability distributions. In this work we replace the log-concavity assumption with a log-concavity at infinity condition. We provide novel $L^2$ convergence rates for Euler schemes, expressed explicitly in terms of problem parameters. From there we derive non-asymptotic bounds on the distance between the laws induced by Euler schemes and the invariant laws of SDEs, both for schemes with standard and with randomised (inaccurate) drifts. We also obtain bounds for the hierarchy of discretisation, which enables us to deploy a multi-level Monte Carlo estimator. Our proof relies on a novel construction of a coupling for the Markov chains that can be used to control both the $L^1$ and $L^2$ Wasserstein distances simultaneously. Finally, we provide a weak convergence analysis that covers both the standard and the randomised (inaccurate) drift case. In particular, we reveal that the variance of the randomised drift does not influence the rate of weak convergence of the Euler scheme to the SDE.


A Framework to Adjust Dependency Measure Estimates for Chance

arXiv.org Machine Learning

Estimating the strength of dependency between two variables is fundamental for exploratory analysis and many other applications in data mining. For example: non-linear dependencies between two continuous variables can be explored with the Maximal Information Coefficient (MIC); and categorical variables that are dependent to the target class are selected using Gini gain in random forests. Nonetheless, because dependency measures are estimated on finite samples, the interpretability of their quantification and the accuracy when ranking dependencies become challenging. Dependency estimates are not equal to 0 when variables are independent, cannot be compared if computed on different sample size, and they are inflated by chance on variables with more categories. In this paper, we propose a framework to adjust dependency measure estimates on finite samples. Our adjustments, which are simple and applicable to any dependency measure, are helpful in improving interpretability when quantifying dependency and in improving accuracy on the task of ranking dependencies. In particular, we demonstrate that our approach enhances the interpretability of MIC when used as a proxy for the amount of noise between variables, and to gain accuracy when ranking variables during the splitting procedure in random forests.